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Abstract 

We  present  a  framework  for  modeling  the  spread  of  pathogens  throughout  a  pop¬ 
ulation  and  generating  policies  that  minimize  the  impact  of  those  pathogens  on 
the  population.  This  framework  is  used  to  study  the  spread  of  human  viruses  be¬ 
tween  cities  via  airplane  travel.  It  combines  agent-based  simulation,  mathematical 
analysis,  and  an  Evolutionary  Algorithm  (EA)  optimizer.  The  goal  of  this  study 
is  to  develop  tools  that  determine  the  optimal  distribution  of  a  vaccine  supply  in 
the  model.  Using  plausible  benchmark  vaccine  allocation  policies  of  uniform  and 
proportional  distribution,  we  compared  their  effectiveness  to  policies  found  by  the 
EA.  We  then  designed  and  tested  a  new,  more  effective  policy  which  increased  the 
importance  of  vaccinating  smaller  cities  that  are  flown  to  more  often.  This  “im¬ 
portance  factor”  was  validated  using  U.S.  influenza  data  from  the  last  four  years. 

Key  words:  epidemiology,  evolutionary  algorithm,  influenza,  migration, 
vaccination. 


1.  Introduction 

At  the  beginning  of  each  influenza  season,  there  are  widespread  concerns  in 
the  U.S.  about  the  supply  and  distribution  of  the  latest  vaccine.  Public  fear  is  fu¬ 
elled  by  stories  of  ineffective  influenza  vaccine  batches,  a  limited  availability  of 
the  vaccine,  and  deadly  new  strains  that  could  possibly  become  pandemic.  Iden¬ 
tifying  the  urgent  need  for  an  effective  vaccination  policy,  this  paper  presents  a 
novel  framework  for  modeling  the  spread  of  pathogens  throughout  a  population 
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and  a  way  to  generate  and  evaluate  policies  that  minimize  the  impact  of  those 
pathogens  on  the  population.  This  framework  is  a  combination  of  agent-based 
simulation,  mathematical  analysis,  and  a  sophisticated  optimization  tool. 

The  objective  of  our  model  is  to  study  of  the  spread  of  human  viruses  between 
cities  via  air  travel.  Previous  continuous  models  of  the  spread  of  disease  between 
cities  include  Rvachev  and  Longini  [26],  Hyman  and  LaForce  [16],  Colizza,  et  al. 
[9],  and  Grais,  et  al.  [11].  These  types  of  epidemiological  models  use  differential 
equations  with  mass  action  [1,  20,  25]  or  migration  [2,  18]  coupling  terms  be¬ 
tween  subpopulations.  Another  approach  is  in  agent-based  simulations  of  disease 
spread,  such  as  [21,  10].  While  these  models  fold  the  transport  between  weakly 
connected  subpopulations  into  the  existing  model,  we  layer  the  model  into  two 
parts.  We  use  an  agent-based  simulation  to  model  the  movement  between  the 
subpopulations  and  differential  equations  to  govern  the  state  of  the  agents  within 
each  subpopulation.  The  complete  simulation  is  parallelized  by  running  each  is¬ 
land  (subpopulation)  on  a  separate  computer  processor. 

We  extend  the  model  to  include  a  supply  of  vaccine,  which  is  to  be  distributed 
in  an  optimal  fashion.  By  mathematical  analysis  of  this  model,  we  determine  the 
feasibility  of  a  vaccination  policy.  Using  a  sophisticated  in-house  optimization 
toolkit  [27,  28],  based  on  Evolutionary  Algorithms  (EA),  we  search  for  vacci¬ 
nation  policies  that  minimize  the  number  of  infected  people.  We  compare  these 
policies  to  plausible  benchmark  policies  to  verify  that  the  EA  policies  are  more 
effective.  Analysis  of  the  EA  policies  indicates  that  the  vaccine  is  generally  dis¬ 
tributed  to  (1)  the  city  of  origin  for  the  virus,  (2)  cities  that  are  traveled  to  most 
often,  and  (3)  smaller  cities.  The  latter  observation  is  most  surprising  and  counter¬ 
intuitive. 

Finally,  we  design  a  new  benchmark  policy  to  improve  upon  the  EA  results, 
based  on  the  three  observations  given  above.  This  new  policy  is  superior  to  all 
of  our  benchmark  policies  and  the  EA-generated  policies.  The  key  to  the  perfor¬ 
mance  is  an  “importance  factor”,  which  emphasizes  the  vaccination  of  small  cities 
that  are  flown  to  most  often.  Confirmation  of  this  “importance  factor”  is  provided 
by  real-world  data  that  monitored  the  spread  of  the  influenza  virus  during  the  win¬ 
ters  of  the  last  four  years.  We  believe  this  new  policy  is  quite  general.  However, 
if  the  model  changes  sufficiently,  new  policies  will  eventually  be  required.  In  that 
case  the  EA  is  always  available  to  guide  the  human  in  the  creation  of  the  new 
policies  for  changing  situations. 
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2.  The  Model 


Our  virus  spread  model  has  two  basic  levels  -  an  inter-city  level,  called  the 
“micro-level”,  and  the  intra-city  level,  called  the  “macro-level”  or  air-travel  model. 
The  micro-level  model  is  governed  by  differential  equations  (the  island  part  of  the 
model,  e.g.,  see  [4,  30]).  Connecting  the  islands  is  the  macro-level  model,  using 
a  Markov  chain  probability  transition  matrix,  that  simulates  the  intra-city  traffic 
by  airline  flights  (the  weakly-connected  part  of  the  model).  The  combination  of 
levels  allows  us  to  model  roughly  55  million  people  within  the  United  States. 


2.1.  The  Micro-Level  Model 

To  simulate  the  disease  spread  within  a  city,  we  use  a  compartmental  model 
that  assumes  the  population  transitions  between  Susceptible,  Asymptomatic,  In¬ 
fected,  and  Recovered  states,  called  the  SAIR  model.  The  Susceptible,  Infected, 
and  Recovered  states  are  standard  states  of  health  used  in  SIR  models.  We  also  use 
an  Asymptomatic  state,  which  represents  the  period  of  time  after  initial  infection 
that  a  person  in  unaware  of  the  infection  (due  to  a  complete  lack  of  symptoms) 
but  is  contagious  [13].  We  also  assume  that  a  person  in  that  state  may  fly  in  the 
macro-level  model,  while  an  infected  person  can  not  fly. 

The  variables  that  represent  the  proportion  of  the  city’s  population  in  each  state 
are:  the  susceptible  state  s,  the  asymptomatic  state  a,  the  infected  state  i,  and  the 
recovered  state  r.  The  proportions  are  changed  each  time-step  of  the  simulation 
according  to  the  following  system  of  differential  equations: 


ds 

dt 

da 

dt 

dl 

dt 

dr 

dt 


8'r—  as 
as  —  f la 
I ua  —  8i 
8i—  8'r. 


(1) 


The  parameters  a,  /i,  8,  and  8'  are  the  probabilities  of  an  individual  transitioning 
between  states,  on  a  daily  basis.  We  can  think  of  a  as  the  infection  rate,  1//1  as 
the  expected  asymptomatic  time  of  the  virus  (how  long  an  individual  is  contagious 
before  showing  symptoms),  1/5  as  the  expected  length  of  time  a  person  spends 
infected  until  recovery,  and  l/5;  as  the  expected  time  it  takes  for  an  individual  to 
lose  resistance  to  the  virus.  The  population  in  the  absence  of  air  travel  is  assumed 
to  be  constant  and  we  have  the  relationship  s  +  a  +  i  +  r  =  1.  The  parameters  /i, 
5,  and  8'  are  constants.  Figure  1  gives  a  diagram  of  the  health  states  and  the 
transitions  between  them. 
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Figure  1 :  S AIR  State  Diagram 

The  infection  rate  (a)  is  related  to  the  proportion  of  the  population  that  can 
infect  an  individual  (i.e.  a  and  i).  The  probability  of  an  individual  being  infected  is 
related  to  the  number  of  people  the  individual  interacts  with  ( b )  and  the  probability 
the  individual  will  be  infected  from  an  encounter  with  an  infected  person  (/3). 
Since  /3  is  the  probability  of  being  infected  by  an  infected  neighbor,  1  —  /3  is  the 
probability  of  not  being  infected  by  an  infected  neighbor.  We  know  a  +  i  is  the 
proportion  of  the  population  that  is  contagious  and  b  is  the  number  of  neighbors  a 
person  has,  so  there  are  approximately  ( a  +  i)b  infected  neighbors  for  a  particular 
person.  This  yields  a  probability  of  not  being  infected  of  (1  —  /3 )(a+0^  But  since 
we  are  interested  in  the  probability  of  being  infected,  we  use  the  complement  of 
the  probability  of  not  being  infected  for  a.  This  yields  the  following  equation 
which  can  be  interpreted  to  be  the  probability  an  individual  will  be  infected  by  at 
least  one  of  his/her  b  neighbors: 

a  =  l-(l-/3)(fl+'>.  (2) 

In  the  simulation,  we  use  this  expression  to  compute  the  infection  rate. 

The  constant  parameters  fi,  <5,  and  S'  are  assumed  to  be  the  same  for  each 
city.  The  infection  rate  (a)  is  the  only  variable  that  could  change  between  cities, 
because  it  depends  on  b  (which  is  related  to  the  population  density).  Our  airports 
are  in  large  U.S.  cities,  so  there  shouldn’t  be  significant  changes  in  b.  Further,  the 
ODE  model  is  not  sensitive  to  small  changes  in  a  (the  dynamics  are  qualitatively 
similar)  and  the  infection  spread  will  just  slightly  speed  up  or  slow  down.  There¬ 
fore,  small  changes  in  that  parameter  should  not  improve  the  accuracy  drastically. 
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We  continue  with  a  brief  overview  of  the  steady  state  solutions  of  this  system 
and  their  stabilities.  These  fixed  points  will  be  written  in  the  form  (, s,a,i,r ).  For 
the  analysis  of  the  behavior  of  the  differential  equations,  we  use  the  following 
approximation  for  small  /3  and  ( a  +  i)b, 

a  ~  1  —  (1  —  j5b(a  +  i))  =  /3 b(a  +  i).  (3) 


The  disease  free  equilibrium  (DFE),  which  represents  the  die  out  of  the  dis¬ 
ease,  is  the  point  (1,0, 0,0).  A  common  method  of  determining  its  local  stability 
is  linearizing  the  system  about  the  steady  state.  The  resulting  eigenvalues  are 


Ai  =  0 

A2  =  -S' 

^  =  pb-(8+^)+y/(pb-(8+^y-+4(l5b(8+n)-n8)  (4) 

i  _  pb—(8+ir)—^/{Pb—(8+fi))2+4(Pb(8+^i)—^8) 


We  have  local  stability  when  these  eigenvalues  are  nonpositive,  or 

The  other  steady  state  solution  represents  the  endemic  equilibrium  (se,  a 
which  is  given  by 

_  v-8 

Se  ~  pb(8+n ) 


ae 

ie 

re 


i+S+ 

8 

8'  ' 


A 

S' 


<  1. 

e->  iei  re)t 


(5) 


To  find  the  replacement  number  R ,  we  follow  Hethcote,  et  al.  [12].  Assuming 
that  a  person  is  contagious  in  the  asymptomatic  and  infected  states,  a  person  is 
infective  for  about  1/jU  +  1/5  days.  If  a  person  is  contagious,  he/she  will  infect 
approximately  /3 b  people,  so  we  can  estimate  R  as  follows: 


R  =  =  I.  (6) 

lib  se 

When  R  <  1,  the  virus  dies  off.  This  agrees  with  the  local  stability  analysis  of  the 
DFE.  When  R  >  1,  the  virus  spreads. 


2.2.  The  Macro-Level  Model 

The  micro-level  model  for  the  cities  provides  the  island  part  of  the  weakly 
connected  island  modeling  framework,  but  we  need  to  provide  the  connections 


5 


between  the  islands.  We  are  interested  in  the  effect  of  airplane  travel  between 
cities  as  a  means  of  spreading  a  virus.  To  model  this,  we  searched  for  data  on 
airplane  travel  in  the  United  States  to  figure  out  which  cities  have  the  biggest 
airports.  As  a  first-order  approximation,  we  concentrated  on  the  the  top  12  U.S. 
airports.  Table  1  shows  the  12  cities  chosen  ordered  by  population  size  with  each 
city’s  yearly  total  passengers. 


City 

Pop.  Size 

Passengers/Year 

%  of  Total 

Las  Vegas 

1.314 

35.009 

6.742 

Denver 

1.985 

35.651 

6.866 

Minneapolis 

2.389 

32.628 

6.284 

Phoenix 

2.907 

35.547 

6.846 

Atlanta 

3.500 

76.876 

14.806 

Houston 

3.823 

33.905 

6.530 

Detroit 

3.903 

32.477 

6.255 

Dallas 

4.146 

52.828 

10.174 

San  Francisco 

4.767 

31.456 

6.058 

Miami 

4.919 

30.060 

5.789 

Chicago 

8.307 

66.565 

12.820 

Los  Angeles 

13.296 

56.223 

10.828 

Total 

55.256 

519.229 

100.000 

Table  1:  Cities  Ordered  by  Population  Size  (in  millions) 

We  used  this  information  to  estimate  the  percentage  of  passengers  for  each 
city,  which  yielded  a  initial  probability  distribution  for  allocating  planes  to  cities. 
We  then  computed  the  frequency  of  flights  between  the  cities.  We  booked  non¬ 
stop  flights  between  each  city  on  a  particular  Wednesday  to  estimate  the  number 
of  flights  between  each  city  on  http://www.expedia.com.  Since  there  are  twelve 
cities,  we  booked  132  (11  x  12)  different  flights  and  counted  the  number  of  non¬ 
stop  flights  between  the  cities.  Table  2  shows  the  number  of  flights  between  each 
city. 

To  convert  the  number  of  flights  into  transition  probabilities,  we  divide  each 
entry  in  the  matrix  in  Table  2  by  its  row  sum,  yielding  the  probability  Qij  of 
traveling  from  the  row  city  i  to  the  column  city  j.  Table  3  shows  the  resulting 
2-matrix.  Note  that  the  table  is  not  symmetric  about  the  diagonal,  so  Qjj  /  2/7  • 
Q  is  ergodic  and  has  an  equilibrium  distribution. 

For  the  twelve  cities,  the  number  of  daily  flights  is  1979,  (see  Table  2).  Ac¬ 
cording  to  the  fleet  composition  of  United  Airlines  we  assume  that  each  plane  can 
carry  between  160  and  300  passengers.  Assuming  that  the  average  flight  carries 
roughly  130  passengers  (at  least  43%  to  82%  of  capacity),  this  yields  about  95 
million  passengers  per  year.  Since  we  are  modeling  roughly  18%  of  the  airport 
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City 

Atl 

Chi 

LA 

Dal 

Den 

Pho 

LV 

Hou 

Minn 

Det 

SF 

Mia 

Total 

Atl 

0 

21 

18 

23 

23 

14 

10 

22 

17 

16 

15 

19 

198 

Chi 

20 

0 

18 

22 

18 

20 

18 

19 

22 

22 

22 

19 

220 

LA 

17 

18 

0 

21 

18 

21 

16 

15 

12 

7 

17 

14 

176 

Dal 

21 

22 

20 

0 

19 

18 

19 

18 

17 

14 

21 

9 

198 

Den 

22 

21 

17 

20 

0 

20 

20 

18 

21 

10 

18 

8 

195 

Pho 

14 

20 

19 

20 

20 

0 

11 

16 

16 

12 

22 

2 

172 

LV 

11 

17 

16 

19 

20 

11 

0 

7 

8 

7 

18 

3 

137 

Hou 

23 

18 

15 

16 

18 

16 

15 

0 

9 

9 

13 

8 

160 

Minn 

18 

22 

11 

17 

20 

16 

8 

9 

0 

15 

8 

3 

147 

Det 

17 

21 

7 

14 

10 

12 

7 

9 

14 

0 

3 

6 

120 

SF 

12 

22 

18 

20 

19 

22 

17 

13 

8 

3 

0 

6 

160 

Mia 

20 

17 

14 

9 

9 

2 

3 

8 

3 

6 

5 

0 

96 

Total 

195 

219 

173 

201 

194 

172 

144 

154 

147 

121 

162 

97 

1979 

Table  2:  Number  of  Flights  Between  Each  City 


Atl 

Chi 

LA 

Dal 

Den 

Pho 

LV 

Hou 

Minn 

Det 

SF 

Mia 

Atl 

0.000 

0.106 

0.091 

0.116 

0.116 

0.071 

0.051 

0.111 

0.086 

0.081 

0.076 

0.096 

Chi 

0.091 

0.000 

0.082 

0.100 

0.082 

0.091 

0.082 

0.086 

0.100 

0.100 

0.100 

0.086 

LA 

0.097 

0.102 

0.000 

0.119 

0.102 

0.119 

0.091 

0.085 

0.068 

0.040 

0.097 

0.080 

Dal 

0.106 

0.111 

0.101 

0.000 

0.096 

0.091 

0.096 

0.091 

0.086 

0.071 

0.106 

0.046 

Den 

0.113 

0.108 

0.087 

0.103 

0.000 

0.103 

0.103 

0.092 

0.108 

0.051 

0.092 

0.041 

Pho 

0.081 

0.116 

0.111 

0.116 

0.116 

0.000 

0.064 

0.093 

0.093 

0.070 

0.128 

0.012 

LV 

0.080 

0.124 

0.117 

0.139 

0.146 

0.080 

0.000 

0.051 

0.058 

0.051 

0.131 

0.022 

Hou 

0.144 

0.113 

0.094 

0.100 

0.113 

0.100 

0.094 

0.000 

0.056 

0.056 

0.081 

0.050 

Minn 

0.122 

0.150 

0.075 

0.116 

0.136 

0.109 

0.054 

0.061 

0.000 

0.102 

0.054 

0.020 

Det 

0.142 

0.175 

0.058 

0.117 

0.083 

0.100 

0.058 

0.075 

0.117 

0.000 

0.025 

0.050 

SF 

0.075 

0.138 

0.113 

0.125 

0.119 

0.138 

0.106 

0.081 

0.050 

0.019 

0.000 

0.038 

Mia 

0.208 

0.177 

0.146 

0.094 

0.094 

0.021 

0.031 

0.083 

0.031 

0.063 

0.052 

0.000 

Total 

1.259 

1.419 

1.073 

1.244 

1.203 

1.022 

0.830 

0.911 

0.853 

0.703 

0.943 

0.540 

Table  3:  Probability  Transition  Matrix  Q 


traffic,  this  is  consistent  with  the  estimate  of  519  million  passengers  per  year  (see 
Table  1). 

The  connection  between  the  macro-level  and  micro-level  models  is  as  follows. 
We  assume  a  well-mixed  population  and  we  randomly  sample  a  group  to  migrate 
from  city  to  city.  Homogeneity  inside  city  populations  is  a  standard  assumption  in 
these  models  (e.g.,  see  [8]).  Over  many  trials,  the  average  of  the  randomly  picked 
group  should  represent  the  properties  of  the  population.  Specifically,  the  arrivals 
deplane,  and  then  a  random  sample  of  the  population  gets  on  the  plane.  Hence, 
at  the  beginning  of  each  flight,  the  proportions  of  susceptible,  asymptomatic,  and 
recovered  (but  not  infected)  people  on  the  plane  are  a  sample  of  the  population  of 
the  city  the  plane  currently  resides  in.  Infected  people  are  assumed  to  be  too  sick 
to  fly.  We  also  do  not  model  the  spread  of  the  disease  within  the  airplane  itself. 


7 


3.  Introducing  Vaccines 

Thus  far  our  model  has  not  included  any  mechanisms  for  containment.  There 
are  many  types  of  containment,  including  vaccines,  anti-viral  medications,  and 
quarantine.  In  this  paper  we  focus  on  vaccines,  and  extend  our  mathematical 
model  to  include  vaccinations. 

We  model  vaccinated  individuals  as  people  with  a  decreased  probability  of  be¬ 
ing  infected  by  an  infected  neighbor.  So  we  effectively  have  two  subpopulations  in 
our  model  -  vaccinated  and  unvaccinated  people.  We  make  a  simplifying  assump¬ 
tion  that  vaccination  has  no  effect  on  the  asymptomatic  and  infected  times.  With 
that  simplifying  assumption,  we  only  need  to  change  the  infection  rate  a.  Let  / 
be  the  proportion  of  the  population  that  is  vaccinated  (0  <  /  <  1).  We  assume  / 
applies  to  each  health  state  equally  -  that  is  /  is  the  proportion  of  the  susceptible 
people  that  are  vaccinated  as  well  as  the  proportion  of  the  asymptomatic  people 
that  have  been  vaccinated,  etc.  Let  /3  be  the  probability  an  unvaccinated  person 
is  infected  by  an  infected  neighbor  and  /3'  <  /3  be  the  probability  a  vaccinated 
person  is  infected  by  an  infected  neighbor.  Therefore,  we  have  two  infection  rates 
-  one  for  the  unvaccinated  (a)  and  the  other  for  the  vaccinated  ( a ')  proportions  of 
the  population, 

a  =  1  —  (1  —  ,j3)C«+0* 

a'  =  l-(l-0')(a+i)g.  (7) 

The  average  probability  of  infection  a  is  given  by 

a  =  fa'  +  (l-f)a.  (8) 

As  before,  we  can  approximate  a  as 

ft«(fl|i)i(/J5'l(l-/)li).  (9) 

Hence,  the  new  differential  equations  model  with  vaccination  is 

=  8'r  —  as 

=  as  —  fia 

=  [la  —  8i 

—  8i—8'r.  (10) 


ds 

dt 

da 

dt 

di 

dt 

dr 

dt 
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The  disease  free  equilibrium  (DFE)  is  the  point  (1,0, 0,0).  The  eigenvalues 
from  the  linearization  about  this  point  is 


A|  =  0 
A2  =  -8' 

^  _  bcQ~(8+n)+^(bcQ-{8+iJ.))2+4(b(o(8+Li)-ii8) 

r,  _  ba~(8+n)-^/(bco-(8+Li))2+4(ba>(S+iJ.)-iJ.S) 

A4  9  5 


(11) 


with  co  —  (// V  +  ( 1  —  /)j3 ) .  The  DFE  has  local  stability  when 


b(fp+(\-mui+8) 

fi8 


< 

N-  .  v  -  / .  /  ^  mj 

1. 

The  other  steady  state  solution  represents  the  endemic  equilibrium,  which  is 
given  by 

Sjj. _ 


Sr  = 


aP  = 


b(s+n)(fp'+(i-m 

1  —Se 

l  +  ft  +  A 


lr.  = 


_  QeH 


rP  = 


Then,  the  replacement  number  R  is 


R  =  i(fp'  +  (i-f)fi)d  +  h 

jd  O 


(12) 


As  before,  when  R  <  1,  the  virus  dies  off.  When  R  >  1,  the  virus  spreads. 

It  is  helpful  to  quantify  the  lower  limit  for  an  effective  vaccination  rate  fe. 
When  R  —  1, 


8ji 


fe  = 


b(8+n) 


-j3 


P'-P 


We  can  define  the  “vaccine  feasibility  region”  as  follows: 

0<fe<f<l- 


(13) 


(14) 


If  fe  <  1,  then  it  is  feasible  to  stop  the  spread  of  the  virus.  Accomplishing  this 
requires  that  a  fraction  fe  <  f  <  1  of  the  population  be  vaccinated.  If  /  <  fe 
or  fe  >  1,  then  the  vaccine  merely  slows  the  spread  of  the  virus.  An  “effective” 
vaccine  can  stop  the  virus  spread  by  causing  herd  immunity  and  the  disease  will 
die  out.  Vaccines  that  have  this  capability  are  in  the  vaccine  feasibility  region. 
Vaccines  not  in  this  region  will  only  lower  the  endemic  threshold  of  the  population 
(the  stable  state  of  infected  individuals). 
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Figure  2  shows  an  example  of  a  theoretically  derived  feasibility  region  for  a 
reasonable  choice  of  parameter  settings.  The  parameter  settings  are  inspired  by  a 
typical  influenza  virus.  With  /i  =  0.35,  the  asymptomatic  time  is  almost  3  days. 
The  period  of  infection  is  roughly  one  week  (5  =  0.15).  Finally,  we  assume  that 
it  takes  roughly  three  years  before  you  become  susceptible  again  (S'  —  0.001). 
The  upper  left  area  above  the  curve  in  black  is  the  region  of  feasible  vaccine 
efficacy,  where  the  efficacy  is  measured  by  /T  (the  probability  an  infective  will 
infect  a  vaccinated  susceptible).  Note  that  even  if  the  vaccine  is  perfect  (/T  =  0), 
then  greater  than  62%  of  the  population  must  be  vaccinated  to  eliminate  the  virus 
spread.  As  the  vaccine  efficacy  decreases  (higher  /3/),  then  that  fraction  increases. 
Once  /3'  >  0.012,  vaccines  will  merely  slow  the  spread  of  the  virus. 


Figure  2:  Theoretical  Vaccine  Feasibility  Region  (b  =  9,  ji=  0.35,  j3  =  0.03,  8  =  0.15,  8'  =  0.001) 

We  compared  the  theoretical  prediction  with  our  simulation.  We  ran  our  sim¬ 
ulation  for  300  days  (using  the  same  parameter  settings),  and  monitored  the  total 
number  of  “sick  days”  (we  do  not  model  births  or  deaths  in  our  simulation).  Fig¬ 
ure  3  summarizes  the  results  of  these  experiments.  This  graph  was  generated 
by  running  the  simulation  with  /  =  0.0  to  /'  =  1.0  in  increments  of  0.01  and 
with  /3/  =  0  to  jS'  =  /3  =  0.03  in  increments  of  0.001.  The  colors  represent  the 
sum  of  sick  days  taken  after  300  days  of  simulation  time  (i.e.  the  sum  of  the 
infected  population  after  each  day  for  300  days).  Inside  the  feasibility  region  the 
virus  resulted  in  less  than  one  million  sick  days.  Outside  of  that  region  the  number 
of  sick  days  increased  enormously.  The  boundary  of  the  region  represents  a  phase 
transition. 
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Experimental  Feasibility  Region 


Figure  3:  Experimental  Vaccine  Feasibility  Region  (b  =  9,  (3  =  0.03,  fl  =  0.35,  8  =  0. 1 5,  8  - 
0.001) 

As  a  brief  implementation  note,  our  simulation  is  written  in  C++,  using  MPI 
to  allow  for  the  use  of  multiple  processors.  The  simulation  was  thoroughly  veri¬ 
fied  and  validated  to  ensure  agreement  with  the  differential  equation  steady-state 
distribution  and  the  Markov  chain  equilibrium  distribution.1  We  used  a  Beowulf 
cluster  with  13  processors.  One  processor  was  used  for  each  of  the  12  cities.  A 
thirteenth  processor  was  used  to  control  the  communication  (the  flights)  between 
each  city.  One  year  of  simulation  time  takes  less  than  one  half  a  minute  of  real 
time. 

4.  Benchmark  Vaccination  Policies 

In  order  to  better  understand  the  effects  of  vaccination  on  our  model,  we  cre¬ 
ated  several  benchmark  vaccination  policies  and  examined  their  performance  over 
several  scenarios.  In  each  scenario  the  virus  starts  as  a  small  population  of  100 
asymptomatic  people  in  one  city.  In  order  to  examine  the  effects  of  city  popula¬ 
tion  size,  we  chose  three  cities  -  Las  Vegas,  Atlanta  and  Los  Angeles.  Las  Vegas 


'in  addition,  a  separate  StarLogo  implementation  was  written,  and  all  results  were  within  1% 
of  the  C++  version 
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is  the  smallest  city,  while  Los  Angeles  is  the  largest.  Atlanta  is  a  city  of  average 
size. 

For  each  city  we  also  examined  three  vaccination  delays  (from  the  day  the 
infection  starts)  of  15,  30,  and  45  days.  These  delays  can  be  considered  to  be  a 
sum  of  two  effects,  (1)  a  delay  in  giving  vaccination  shots,  and  (2)  a  delay  in  how 
long  it  takes  for  the  vaccine  to  take  effect. 

Using  the  influenza  virus  parameters  mentioned  earlier  in  the  paper,  the  pro¬ 
portion  of  the  population  that  must  be  vaccinated  to  ensure  that  the  virus  is  quickly 
eliminated  is  /  >  /<?  ~  92%  (assuming  =  0.01).  Since  the  total  population  size 
is  roughly  55  million  people,  theory  indicates  that  at  least  roughly  51  million  vac¬ 
cines  are  required  to  quickly  eliminate  the  virus.  However,  this  is  unlikely  to  be 
the  case,  so  we  consider  having  the  number  of  vaccines  V  range  from  5  million  to 
55  million,  in  increments  of  5  million.  Hence,  when  there  are  less  than  50  million 
vaccines,  it  is  impossible  to  completely  stop  the  spread  of  the  virus.  Instead,  our 
goal  is  to  allocate  the  vaccines  in  such  a  way  as  to  minimize  the  impact.  Our  mea¬ 
sure  of  “impact”  is  the  total  number  of  sick  days  taken  by  the  population  over  450 
days. 

We  first  considered  two  possible  benchmark  vaccination  policies.  The  “uni¬ 
form”  vaccination  strategy  allocates  l/N  of  the  number  of  vaccines  to  each  of  the 
N  cities.  In  this  model,  N  =  12.  The  “proportional”  vaccination  strategy  allocates 
vaccines  according  to  population  size  (e.g.,  a  city  that  has  twice  as  many  people 
will  receive  twice  as  many  vaccines).  We  also  considered  a  possible  modification 
to  these  two  policies.  Preliminary  runs  indicated  that  fully  vaccinating  the  city  of 
viral  origin  was  important.  This  turned  out  to  always  hold,  so  our  policies  were 
modified  as  follows: 


Name 

Description 

Uniform 

Fully  vaccinate  the  city  of  origin, 
uniformly  distributing  the  rest 
to  the  remaining  cities 

Proportional 

Fully  vaccinate  the  city  of  origin, 
distributing  the  rest  to  the 
remaining  cities  according  to  their 
proportion  total  population  minus 
the  population  of  the  city  of  origin. 

Figure  4:  Two  Benchmark  Vaccine  Allocation  Policies 
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Our  results  were  remarkably  consistent,  regardless  of  the  city  of  origin  and 
the  vaccine  delay.  Hence,  in  this  paper  we  present  only  the  results  for  Atlanta 
with  a  30  day  delay  in  vaccination.  In  Figure  5,  we  see  that  the  proportional  and 
uniform  allocation  policies  are  roughly  equivalent  up  to  20  million  vaccines.2  Be¬ 
tween  25  and  40  million  vaccines,  the  uniform  allocation  policy  was  considerably 
better  than  the  proportional  policy.  Above  40  million  vaccines,  the  two  policies 
are  roughly  equivalent  again.  Overall,  the  uniform  allocation  of  vaccines  was 
definitely  superior. 

This  result  is  counter-intuitive.  There  is  a  two-fold  explanation.  First,  the 
probability  of  infection  depends  on  the  proportion  of  asymptomatic  and  infected 
people.  Hence,  having  the  virus  start  as  a  small  population  of  100  asymptomatic 
people  yields  a  smaller  proportion  in  larger  cities.  One  can  consider  this  to  be 
a  “dilution  factor”  and  explains  why  outbreaks  appear  to  spread  much  faster  in 
very  small  communities.  Second,  the  passengers  on  a  plane  are  a  representative 
sampling  of  the  city  that  the  plane  is  in.  The  odds  of  an  asymptomatic  person 
boarding  the  plane  are  much  higher  in  a  small  city.  Hence,  it  is  actually  better  to 
vaccinate  smaller  cities. 


Atlanta  Proportional  vs.  Uniform  30  Day  Delay 


Figure  5:  Atlanta  Uniform  and  Proportional  Baselines  Comparison  with  30  Day  Delay  (j3  =  0.03, 
p'  =  0.01,  b  =  9,p=  0.35,  8  =  0.15,  8'  =  0.001) 

Given  this  explanation,  we  hypothesized  that  an  “inverse  proportional”  allo¬ 
cation  policy  might  be  even  more  effective  (smaller  cities  get  proportionally  more 


2  All  data  points  are  averaged  over  100  independent  runs. 
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vaccine).  Our  algorithm  for  computing  inverse  proportions  is  as  follows  [23].  Let 
N  be  the  total  population  of  the  cities,  «,  be  the  population  of  city  i,  and  p{  be  the 
proportion  of  the  total  population  for  city  i,  computed  as  p,  =  n,/N. 

Let  pmax  be  the  proportion  of  the  population  of  the  largest  city,  «,•  =  l—pt  (the 
proportion  of  the  population  not  in  city  i),  and  m  be  the  minimum  of  the  values  of 
Ui  (the  proportion  of  the  population  not  in  the  largest  city  or  1  —  pmax)-  Then  let 
wi  =  Ui/m,  and  W  —  Y.)-\  wi-  Then  invj  —  WjW  are  the  inverse  proportions  of  the 
city  populations.  In  this  model, 


invi 


l-  Pi  ^  i  -  pi 

1  Pmax  j—  |  1  Pmax 


(15) 


The  algorithm  is  shown  in  Figure  6. 


Name 

Description 

Inverse 

Proportional 

Fully  vaccinate  the  city  of  origin, 
distributing  the  rest  to  the  remaining 
cities  such  that  each  city  gets  a  vaccine 
supply  inversely  proportional  to  its 
proportion  of  the  population  (smaller 
cities  get  more  than  larger  cities). 

Figure  6:  Inverse  Proportional  Baseline  Vaccine  Allocation  Policy 


Figure  7  shows  how  the  new  inverse  proportional  vaccine  allocation  policy 
compares  with  proportional  and  uniform  vaccine  allocation  policies.  The  inverse 
proportional  policy  is  competitive  with  or  better  than  the  prior  two  policies. 

5.  Evolved  Vaccination  Policies 

Are  there  even  better  vaccination  policies?  We  have  a  well-defined  optimiza¬ 
tion  problem,  namely,  to  allocate  vaccines  such  that  the  number  of  sick  days  is 
minimized.  Hence,  we  turned  to  evolutionary  algorithms  (EAs)  to  automate  our 
search  for  better  vaccination  policies.  EAs  have  had  excellent  success  at  solving 
real-world  problems  such  as  in  forensics  [17],  stock  price  forecasting  [7],  data 
compression  in  wireless  sensor  networks  [22],  path  planning  [35],  and  evolving 
static  output  feedback  controllers  [34].  If  the  EA  could  find  better  policies,  our 
hope  was  that  we  could  analyze  the  policies  to  explain  what  the  EA  had  discov¬ 
ered. 
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Atlanta  Proportional  vs.  Uniform  vs.  Inverse  Proportional  30  Day  Delay 


Millions  of  Vaccines 


Figure  7:  Atlanta  Inverse  Proportional  Policy  Baseline  Comparison  with  30  Day  Delay 

EAs  are  inspired  by  evolutionary  theory,  especially  the  concept  of  “survival 
of  the  fittest”.  The  EA  maintains  a  population  of  individuals.  Each  individual 
represents  a  possible  solution  to  a  problem.  A  fitness  function  converts  the  rep¬ 
resentation  into  a  real  number  that  measures  the  quality  of  the  solution.  The  EA 
takes  a  population  of  solutions  and  evolves  them  to  a  better  fitness  level.  Selec¬ 
tion  serves  to  focus  the  search,  and  exploit  the  knowledge  gained  thus  far.  Genetic 
operators  provide  a  random  component,  allowing  for  exploration  of  the  search 
space.  Patel,  Longini  and  Halloran  [24]  also  use  an  EA  to  evolve  optimal  vaccina¬ 
tion  strategies.  However,  their  focus  is  on  the  optimal  allocation  of  vaccine  to  five 
age  groups:  pre-school,  school,  young  adults,  middle  aged  adults,  and  old  adults. 
Our  focus  is  entirely  different  (although  complementary),  since  we  are  concerned 
with  the  optimal  allocation  of  vaccines  to  geographic  regions  of  the  country. 

Specifically,  we  used  a  (/i,  A)  Evolution  Strategy  (ES)  with  /i  =  10  parents 
creating  A  =  30  children  via  mutation  (described  below).  The  /I  =  10  best  children 
then  constitute  the  population  at  the  next  generation.  The  basic  execution  of  our 
ES  is  the  following: 

1)  Randomly  initialize  the  population  of  /i  parents. 

2)  Evaluate  the  population  using  the  fitness  function. 

3)  Apply  mutation  to  create  A  children. 

4)  Evaluate  the  children  using  the  fitness  function. 

5)  Select  the  /i  best  children  to  form  the  next  population. 

6)  Repeat  3-5  until  some  termination  criteria  are  met. 


15 


In  our  case,  an  individual  represents  the  allocation  of  vaccines  to  each  of  the 
twelve  cities.  We  use  an  integer  vector  of  length  twelve,  v  =  (vi,  V2,  •  •  • ,  V12), 
where  v,-  represents  the  number  of  vaccines  allocated  to  city  i.  The  values  are 
constrained  such  that  the  sum  of  the  vaccines  allocated  to  each  city  equals  the  total 
number  of  available  vaccines  ( V  —  v,).3  We  created  a  “delta-swap”  genetic 

operator  to  mutate  the  individuals.  Two  cities  swap  a  random  amount  of  vaccine. 
This  preserves  the  constraint  that  V  =  Y*)-\  v,-.  We  also  constrain  the  number  of 
vaccines  allocated  to  a  city  to  be  no  greater  than  the  population  size  of  that  city 
(Vf  <  rii). 

The  details  of  “delta-swap”  are  as  follows.  Two  cities  are  chosen  uniformly 
randomly  (however,  the  two  cities  can  not  be  the  same).  Denote  S  as  the  source 
city  (not  to  be  confused  with  the  Susceptible  state  of  health),  while  D  is  the  city 
of  destination.  Let  vs  be  the  amount  of  vaccine  currently  at  city  S,  while  yp  is 
the  amount  of  vaccine  at  city  D.  The  amount  of  vaccine  to  swap,  denoted  Av, 
is  chosen  uniformly  from  U(0,  min(vs,  200000)).  In  other  words,  the  amount 
of  vaccine  to  swap  is  chosen  uniformly  between  0  and  200000,  or  0  and  V5  (if 
vs  <  200000).  Finally,  if  vd  +  Av  <np>  the  swap  occurs  and  Av  vaccine  is  added 
to  D  and  is  subtracted  from  S. 

All  that  remains  is  to  measure  the  “fitness”  of  an  individual.  For  this  appli¬ 
cation,  fitness  is  simply  the  number  of  sick  days  that  occur,  given  the  allocation 
of  vaccines  defined  by  the  individual.  Since  we  are  minimizing,  “better  fitness” 
means  that  we  want  a  lower  number  of  sick  days.  One  complication  is  that  our 
viral  simulation  is  stochastic.  Hence,  the  fitness  must  be  averaged  over  a  num¬ 
ber  of  independent  trials.  If  the  number  is  too  large,  the  EA  is  too  slow.  If  the 
number  is  too  small,  we  can  not  trust  the  results  (due  to  variance).  We  chose  an 
alternative  approach,  based  on  our  work  where  we  evolve  finite-state  machines 
in  non-deterministic  environments  [31,  32].  First  the  EA  uses  a  relatively  small 
number  of  trials  (10)  to  estimate  the  fitness  of  an  individual.  If  the  fitness  is  bet¬ 
ter  than  the  best  fitness  ever  seen,  then  the  EA  re-evaluates  that  individual  over  a 
larger  number  of  trials  (40)  .  This  approach  worked  quite  well.  Our  termination 
criterion  was  100  generations,  after  which  the  best  individual  was  evaluated  over 
100  runs  of  the  simulation  as  with  our  benchmark  policies. 

Figure  8  shows  the  results  for  Atlanta  with  a  30  day  delay  in  introducing  the 
vaccine.  The  results  are  very  impressive.  The  policy  found  by  the  EA  is  clearly 


Although  our  representation  is  fixed  length,  our  motivation  came  from  the  variable  length 
representation  in  the  proportional  genetic  algorithm  as  described  in  [36]. 
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Atlanta  Inverse  Proportional  vs.  EA  30  Day  Delay 


Figure  8:  Atlanta  EA  and  Inverse  Proportional  Policy  Baseline  Comparison  with  30  Day  Delay 
superior  to  our  inverse  proportional  policy. 

6.  A  Better  Vaccination  Policy 

Recall  that  that  EA  policies  are  simply  vectors  of  twelve  integers.  As  such, 
the  reason  for  the  good  performance  is  not  immediately  obvious.  Hence,  we  con¬ 
ducted  a  thorough  analysis  of  a  number  of  good  EA  policies,  and  found  a  number 
of  commonalities. 

First,  as  expected,  the  city  of  origin  always  received  a  large  number  of  vac¬ 
cines.  Second,  (and  also  expected),  the  EA  gave  more  vaccine  to  smaller  cities. 
This  is  in  line  with  our  inverse  proportional  policy.  However,  the  EA  policy  is 
superior  to  the  inverse  proportional  policy.  Why  is  this  the  case?  The  key  is  to 
note  what  information  the  EA  has  implicitly  available.  The  first  commonality  was 
based  on  figuring  out  the  city  of  origin.  The  second  commonality  was  based  on 
the  EA  taking  advantage  of  population  size.4 

However,  there  is  a  third  source  of  important  information  that  we  have  not 
taken  into  account,  namely,  the  probability  transition  matrix  Q.  When  we  took 
this  into  consideration  we  found  that  the  EA  was  not  only  focusing  on  small  cities, 
but  on  cities  that  are  flown  to  most  often!  The  (^-matrix  gives  us  the  likely  next 


4It  is  important  to  clarify  that  the  EA  does  not  reason  at  this  level.  However,  the  EA  is  perfectly 
capable  of  taking  advantage  of  useful  features  and  structures  in  the  search  space.  We  are  providing 
a  human  interpretation  of  these  features  and  structures. 
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location  for  each  plane,  so  it  makes  sense  that  this  information  is  incredibly  im¬ 
portant  in  predicting  where  the  virus  spreads  from  a  certain  city  or  cities  and  thus 
contributes  greatly  to  better  vaccine  allocation  policies. 

In  order  to  mimic  what  the  EA  is  doing,  we  created  an  “importance  factor”. 
This  factor  captures  the  notion  that  the  cities  that  are  important  to  vaccinate  are 
those  that  are  small  and  are  flown  to  most  often.  Let  Sq  be  the  initial  distribution 
of  the  virus.  For  example,  if  the  virus  starts  in  Las  Vegas,  then  our  initial  distri¬ 
bution  is  So  =  [0,0, 0,0, 0,0, 1,0, 0,0, 0,0].  If  the  virus  starts  in  Atlanta  and  Las 
Vegas,  then  So  =  [0.5, 0,0, 0,0, 0,0.5, 0,0, 0,0,0].  Then  Si  =  So  ■  Q  represents 
the  probability  distribution  of  where  the  virus  will  travel,  in  the  next  time  step. 
Let  »,  be  the  population  size  of  city  i,  and  impi  denote  the  importance  of  city  i, 

impi  =  Sij/rii. 

Cities  are  then  vaccinated  in  descending  order  based  on  their  importance. 
However,  is  it  necessary  to  fully  vaccinate  the  important  cities?  We  also  noted 
that  the  EA  tended  to  vaccinate  roughly  92%  of  the  city  population!  Recall,  that 
for  the  experiments  reported  in  this  paper,  theory  indicated  that  at  least  92%  of 
the  population  of  a  city  must  be  vaccinated,  in  order  to  rapidly  eliminate  the  virus 
(fe  ~  92%).  The  EA  clearly  learned  that  providing  more  vaccine  was  wasteful, 
and  that  the  excess  should  be  distributed  to  the  other  cities. 

Figure  9  describes  a  new  vaccine  allocation  policy  based  on  the  city  of  origin, 
the  predicted  importance  of  a  city,  and  the  vaccine  efficacy. 


Name 

Description 

Markov 

Fully  vaccinate  the  city  of  origin,  then 
vaccinate  the  rest  in  order  of  importance 
(those  cities  most  flown  to  with  smaller 
population  sizes)  up  to  the  steady  state 
level  of  vaccination  fe 

Figure  9:  Policy  Combining  ()- Matrix,  Population  Size,  and  Vaccine  Efficacy 

Figure  10  shows  the  results  with  the  new  policy.  It  is  competitive  with  the  EA 
policy,  and  is  better  at  15  million  and  20  million  vaccines.  The  only  weakness  is 
at  5  million  vaccines.  However,  it  is  clear  that  we  have  captured  and  explained 
much  of  the  EA  policy. 
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Atlanta  EA  vs.  Markov  f=0.9167  30  Day  Delay 


Figure  10:  Atlanta  EA  and  Markov  Policy  Baseline  Comparison  with  30  Day  Delay 


7.  Confirmation  of  the  Model 

The  key  to  our  new  vaccination  strategy  is  the  importance  factor  defined  above. 
Although  we  cannot  (currently)  evaluate  the  vaccination  strategy  in  the  real  world, 
we  realized  that  we  can  in  fact  evaluate  the  importance  factor  itself.  As  stated 
above,  the  importance  factor  captures  the  central  concept  that  the  virus  will  have  a 
greater  impact  on  small  cities  that  are  flown  to  most  often.  If  this  importance  factor 
is  correct,  it  should  be  consistent  with  the  spread  of  the  real  influenza  throughout 
the  U.S. 

We  examined  the  spread  of  the  last  four  influenza  seasons,  using  the  CDC 
influenza  maps.  Our  twelve  cities  are  based  in  ten  states  -  Arizona,  California, 
Colorado,  Florida,  Georgia,  Illinois,  Michigan,  Minnesota,  Nevada,  and  Texas. 
We  examined  the  situation  where  the  influenza  became  “widespread”  in  a  state 
(or  states),  and  then  we  used  the  importance  factor  to  predict  where  the  influenza 
would  spread  to  next.  Widespread  influenza  activity  is  defined  as  “Outbreaks  of 
influenza  or  increases  in  ILI  [influenza-like  illness]  cases  and  recent  laboratory- 
confirmed  influenza  in  at  least  half  the  regions  of  the  state.”[6] 

7.1.  2005/2006  Influenza  Season 

In  week  5 1  of  2005,  the  influenza  virus  reached  widespread  levels  in  California 
and  Arizona  (Figure  11).  We  split  the  infected  percentages  between  Arizona  and 
California,  dividing  it  amongst  the  cities  in  those  states.  This  gave  us  an  initial  dis¬ 
tribution  of  Sq  =  [0,0, 0.3, 0,0, 0.3, 0,0, 0,0, 0.3,0].  We  computed  the  importance 
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factors  for  each  city  (state).  Three  states  had  importance  values  much  greater 
than  the  rest  (Texas,  Nevada,  and  Colorado).  In  week  52  of  2005  (Figure  12) 
and  weeks  1  and  2  of  2006,  these  three  states  attain  and  maintain  a  widespread 
influenza  activity  level.  This  particular  season  was  not  especially  virulent. 


Figure  11:  U.S.  Influenza  Activity  Week  Ending  December  24,  2005 


Weekly  Influenza  Activity  Estimates  Reported 
by  State  &  Territorial  Epidemiologists 


fU**t+4 


] 


Figure  12:  U.S.  Influenza  Activity  Week  Ending  December  31,  2005 
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7.2.  2006/2007  Influenza  Season 

These  2005/2006  results  are  pleasing,  but  are  a  bit  weak,  since  the  states  are 
adjacent.  The  2006/2007  influenza  season,  however,  was  more  interesting.  In 
week  49  of  2006,  influenza  was  widespread  in  Florida  (Figure  13).  If  we  use  the 
initial  distribution  So  =  [0,0, 0,0, 0,0, 0,0, 0,0,0, 1],  our  model  predicts  that  five 
states  had  importance  factors  much  greater  than  the  rest  (in  order,  Georgia,  and 
then  to  Colorado,  Texas,  Nevada,  and  Minnesota).  This  influenza  is  slower  to 
spread,  yet  our  predictions  were  surprisingly  accurate. 

In  week  50  of  2006,  we  see  that  Georgia  and  Florida  have  widespread  in¬ 
fluenza  conditions,  which  corresponds  to  the  prediction  that  Atlanta  would  be  next 
(Figure  14).  We  then  have  to  jump  ahead  to  week  4  of  2007,  where  Texas  and  Min¬ 
nesota  have  become  widespread  (Figure  15).  If  we  jump  ahead  another  4  weeks 
to  week  8  of  2007,  we  see  Colorado  is  now  widespread  (Figure  16).  Our  only 
important  error  was  with  respect  to  Nevada,  which  did  not  become  widespread. 
However,  it  is  important  to  note  that  these  states  are  not  adjacent,  so  it  does  seem 
reasonable  to  conclude  that  the  spread  was  via  air  travel. 


Weekly  Influenza  Activity  Estimates  Reported 
by  State  &  Territorial  Epidemiologists 

Week  enOng  December  9  2006  -  Week  49 


Figure  13:  U.S.  Influenza  Activity  Week  Ending  December  9,  2006 


7.3.  2007/2008  Influenza  Season 

In  week  49  of  2007,  the  influenza  was  widespread  in  Texas,  so  we  started 
our  initial  distribution  in  Houston  and  Dallas.  Our  model  predicts  that  the  virus 
should  spread  first  to  Colorado,  and  then  to  Nevada,  Georgia,  Arizona,  Minnesota, 
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Figure  14:  U.S.  Influenza  Activity  Week  Ending  December  16,  2006 

California,  Michigan,  Illinois,  and  Florida  (in  that  order).  This  influenza  was 
especially  virulent  -  every  state  became  widespread. 

In  week  1  of  2008,  Colorado  became  widespread.  Then,  in  week  5  of  2008 
both  Georgia  and  Arizona  were  widespread.  In  week  6  the  influenza  became 
widespread  in  Nevada,  Minnesota,  California,  Michigan,  and  Illinois.  Finally,  in 
week  9  Florida  became  widespread.  Our  only  error  was  again  with  respect  to 
Nevada,  which  became  widespread  after  Georgia  and  Arizona. 

7.4.  2008/2009  Influenza  Season 

In  week  1  of  2009  the  influenza  became  widespread  in  Virginia.  However, 
our  model  does  not  include  any  airports  in  Virginia,  so  we  could  not  begin  there. 
However,  the  second  state  to  become  widespread  was  Texas  in  week  3  of  2009. 
Hence,  we  ran  the  same  model  as  with  the  2007/2008  influenza  season.  This 
season  was  not  as  virulent  -  quite  a  few  states  never  became  widespread. 

Of  the  states  we  model,  Colorado  became  widespread  next,  in  week  4.  Then 
in  week  5  both  Nevada  and  Georgia  became  widespread.  In  week  6  Arizona  and 
Florida  became  widespread.  Finally,  California  became  widespread  in  week  1 1 . 
Minnesota,  Michigan,  and  Illinois  never  became  widespread.  Our  model  has 
a  minor  error  in  that  Minnesota  never  became  widespread,  although  California 
did.  However,  an  investigation  of  the  importance  factors  showed  that  the  num¬ 
bers  for  both  states  were  almost  identical.  Recall  that  the  influenza  really  started 
in  Virginia  -  this  quite  naturally  has  an  affect  on  the  real-world  spread  which 
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Figure  15:  U.S.  Influenza  Activity  Week  Ending  January  27,  2007 


Figure  16:  U.S.  Influenza  Activity  Week  Ending  February  24,  2007 

we  could  not  model.  This  is  reflected  in  our  one  significant  error.  Florida  be¬ 
came  widespread  very  early  in  the  season,  as  opposed  to  very  late.  Again  we 
consider  this  to  be  caused  by  Virginia  (there  are  a  lot  of  flights  from  the  Vir¬ 
ginia/Washington  D.C./Maryland  airports  to  Florida. 
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7.5.  2009/2010  Current  Influenza  Season 

The  CDC  influenza  maps  monitor  “Influenza-like  Illnesses”  (ILI),  but  don’t 
separate  the  different  versions.  Hence  the  surveillance  reports  do  not  or  cannot 
subtype  influenza  A  into  HI,  H3,  and  H1N1.  Due  to  the  longevity  of  H1N1,  and 
the  mixed  data,  it  is  impossible  to  separate  out  the  normal  influenza  strains  in 
order  to  make  predictions. 

7.6.  Quantitative  Confirmation  of  the  Model 

In  this  subsection  we  quantitatively  analyze  the  accuracy  of  our  predictions. 
As  an  illustrative  example,  consider  the  2008/2009  influenza  season.  Colorado 
became  widespread  on  January  31,  both  Nevada  and  Georgia  became  widespread 
on  February  7,  and  both  Arizona  and  Florida  became  widespread  on  February  14. 
Finally,  California  became  widespread  on  March  21.  Since  the  CDC  maps  are 
updated  weekly,  the  result  is  a  partial  ordering  of  r  states:  CO  <  NV  <  GA  < 
AZ  <  FL  <  CA  where  x  <  y  means  x  occurs  temporally  before  y  (note,  we  adopt 
the  USPS  state  abbreviations  for  conciseness).  There  are  four  possible  strict  total 
orderings  resulting  from  the  partial  ordering: 

1.  CO  <NV  <GA<AZ<FL<  CA 

2.  CO<GA<NV<AZ<FL<  CA 

3.  CO  <NV  <GA<FL<AZ<  CA 

4.  CO<GA<NV<FL<AZ<  CA 

As  shown  above,  our  model  uses  the  importance  factors  to  provide  a  strict  total 
ordering  of  p  states.  Because  not  all  states  will  necessarily  become  widespread, 
r  <  p.  For  the  2008/2009  influenza  season  our  model  predicts  the  following  total 
ordering:  CO  <  NV  <  GA  <  AZ  <  MN  <  CA  <  MI  <  IL  <  FL. 

We  now  want  to  create  a  distance  metric  from  the  predicted  ordering  to  the  real 
world  partial  ordering.  Our  distance  metric  is  the  minimum  number  of  “2  swaps” 
required  to  make  the  predicted  ordering  “consistent”  with  the  real  world  partial 
ordering5 .  The  predicted  ordering  is  consistent  if  the  first  r  cities  of  the  predicted 
ordering  match  one  of  the  real  world  total  orderings. 

In  the  case  given  above,  if  MN  and  FL  are  swapped  in  the  predicted  ordering, 
we  get  CO  <  NV  <  GA  <  AZ  <  FL  <  CA  <  MI  <  IL  <  MN.  This  is  consistent 
with  the  real  world  partial  ordering,  so  the  distance  of  the  predicted  ordering  from 


5The  2  swap  operator  randomly  chooses  two  different  states  and  swaps  their  positions. 


24 


Influenza  Season 

Model 

S 

Rand 

Vx 

om  Prec 

Ox 

Actions  X 

skew(X) 

Statii 

Prob(X  <  S) 

sties 

Wilcoxon  p 

2005/2006 

2 

3.34 

0.85 

-0.29 

12% 

<  0.00001 

2006/2007 

2 

3.22 

0.74 

-0.72 

14% 

<  0.00001 

2007/2008 

1 

3.17 

0.75 

-0.56 

2% 

<  0.00001 

2008/2009 

1 

4.23 

1.02 

-0.80 

1% 

<  0.00001 

Table  4:  Quantitative  Analysis 


reality  is  one.  For  the  first  three  influenza  seasons  (2005/2006,  2006/2007,  and 
2007/2008)  the  distance  from  reality  is  two,  two,  and  one,  respectively.  Hence, 
all  predictions  were  close  to  reality,  with  the  last  two  seasons  having  the  best 
predictions.  The  “<5”  column  in  Table  4  summarizes  those  results. 

As  a  baseline  comparison,  we  also  created  100  random  predictions  for  each 
of  the  four  influenza  seasons.  If  our  model  is  correctly  capturing  the  dynamics 
of  the  real  world,  the  predictions  from  the  model  should  be  significantly  better 
than  predictions  provided  by  random  orderings.  The  random  variable  X  denotes 
the  distance  of  the  random  prediction  from  the  real  world.  The  mean,  standard 
deviation  and  skewness  of  X  are  also  shown  in  Table  4,  denoted  by  jdX-  <Jx,  and 
skew(X).  In  all  cases  the  mean  distance  of  the  random  distributions  is  higher  than 
the  prediction  created  by  our  model.  For  the  first  two  influenza  seasons  the  quality 
of  model  prediction  is  roughly  1 .5  standard  deviations  better  than  the  mean.  For 
the  last  two  influenza  seasons  the  quality  of  the  model  prediction  is  roughly  three 
standard  deviations  better  than  the  mean. 

It  is  helpful  to  calculate  the  probability  that  the  random  predictions  could 
match  or  outperform  the  quality  of  the  model  prediction.  Those  probabilities  are 
given  in  Table  4,  under  Prob(X  <  5).  For  all  four  seasons,  the  probabilities  are 
12%,  14%,  2%  and  1%.  Treated  as  a  whole,  it  is  therefore  highly  unlikely  for 
random  predictions  to  perform  nearly  as  well  as  our  model. 

Although  all  results  indicate  that  our  model  is  performing  better  than  random, 
the  results  for  the  last  two  seasons  are  exceptionally  good.  We  believe  the  reason 
for  this  is  that  we  only  created  one  Q  matrix,  reflecting  current  flight  usage.  Flight 
usage  varies  by  year,  and  it  is  quite  reasonable  that  the  current  Q  matrix  would  not 
model  older  seasons  as  accurately. 

For  one  final  test,  we  performed  a  Wilcoxon  signed-rank  test.  This  test  does 
not  assume  any  particular  underlying  distribution,  and  can  be  used  when  the  distri¬ 
bution  is  reasonably  symmetric.  The  values  of  skew(X )  (which  are  low)  indicate 
that  the  Wilcoxon  test  can  be  applied.  For  all  four  seasons,  the  significance  level 
p  is  much  less  than  0.00001,  confirming  again  that  it  is  extremely  unlikely  for  the 


25 


quality  of  random  predictions  to  be  nearly  as  good  as  the  quality  of  the  model 
predictions. 

7.7.  Summary 

Although  we  do  not  attempt  to  predict  the  speed  of  the  viral  spread  (as  ex¬ 
plained  below),  the  predicted  ordering  of  viral  spread  is  surprisingly  accurate, 
given  that  we  are  modeling  flights  from  only  12  cities. 

8.  Conclusion 

We  have  created  a  framework  for  simulating  and  analyzing  weakly  connected 
island  models.  This  framework  consists  of  two  levels.  First,  a  micro-level  de¬ 
scribes  the  behavior  of  individuals  within  an  island.  Second,  the  macro-level  mod¬ 
els  the  travel  of  individuals  between  islands.  The  two  levels  are  largely  separate, 
from  a  functional  point  of  view.  The  micro-level  is  used  to  calculate  the  speed  of 
the  viral  spread.  The  macro-level  is  used  to  calculate  where  the  virus  will  spread. 
The  key  to  the  accurate  modeling  of  the  micro-level  is  to  fit  the  virus  parameters 
to  the  observable  quantities  of  a  particular  virus.  However,  once  this  is  done,  it  is 
extremely  difficult  to  draw  general  conclusions.  This  is  the  reason  why  we  mod¬ 
eled  a  rather  generic  influenza  virus  and  a  small  number  of  airports  -  it  lends  itself 
to  a  reductionist  interpretation  far  more  readily. 

Our  reductionist  approach  serves  as  a  contrast  to  the  work  with  large  epidemi¬ 
ological  “engines”,  where  the  models  are  extremely  difficult  to  analyze.  One  ex¬ 
ample  is  the  work  by  Colizza,  et  al.  [9],  that  uses  the  largest  3,100  airports.6 
However,  the  model  is  then  used  to  simulate  the  spread  of  only  one  influenza 
virus  in  only  nine  “surveillance  regions”  within  the  U.S.,  as  opposed  to  examin¬ 
ing  individual  states.  No  reductionist  interpretation  is  provided. 

We  applied  our  model  to  the  spread  of  a  virus  between  cities  via  air  travel.  This 
model  has  been  verified  and  validated,  by  demonstrating  that  the  steady  state  of 
the  micro-level  (differential  equations)  and  the  macro-level  (Markov)  agree  with 
mathematical  analysis.  We  then  augmented  the  model  to  include  vaccinations  and 
used  an  EA  optimizer  to  generate  good  polices.  We  then  examined  those  policies 
to  explain  and  develop  superior  policies.  It  is  important  to  point  out  that  the  EA 
was  used  both  as  an  optimizer  and  a  tool  for  discovery.  Without  the  EA  it  would 
have  been  extremely  difficult  for  us  to  understand  the  dynamics  of  the  model. 


international  Air  Transport  Association  database  (http://www.iata.org) 
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The  key  to  our  success  is  an  “importance  factor”,  which  states  that  the  cities 
of  importance  (both  from  the  point  of  view  of  viral  spread  and  their  need  for  vac¬ 
cines)  are  those  that  are  small  yet  are  flown  to  most  often.  Smaller  cities  experi¬ 
ence  a  more  rapid  response  to  an  invading  infected  population.  They  also  produce 
infected  individuals  more  quickly  as  a  result.  Cities  that  are  flown  to  more  often 
are  dangerous  because  more  planes  are  carrying  infected  people  to  these  cities. 
This  factor  provides  a  highly  valuable  reductionist  interpretation  of  a  complex 
model. 

We  have  also  demonstrated  that  there  is  a  connection  between  our  importance 
factor  and  the  spread  of  influenza  in  the  real  world  by  comparing  the  previous 
four  influenza  seasons  to  our  importance  factor  predictions.  This  suggests  that  we 
have  captured  a  good  first-order  approximation  of  the  effect  of  air-travel  as  a  virus 
vector  on  the  United  States.  It  is  important  to  note  that  this  success  comes  despite 
the  fact  that  each  influenza  season  involves  a  different  virus  with  different  micro¬ 
level  parameters  that  we  did  not  attempt  to  model.  In  fact,  an  alternative  micro¬ 
level  model  omits  the  Asymptomatic  state  of  health  -  namely,  the  SIR  model.  In 
this  situation  Infected  people  can  fly.  Our  studies  indicate  that,  once  again,  due 
to  the  functional  separation  between  the  micro-level  and  macro-level  models,  this 
change  affects  the  speed  of  the  virus  (as  would  be  expected),  but  not  the  ordering. 

Due  to  the  number  of  studies  we  have  performed  (there  is  insufficient  room  in 
this  paper  to  include  them  all),  we  believe  that  these  observations  are  quite  general 
and  hold  in  a  wide  variety  of  circumstances  [29].  However,  the  nice  aspect  of  our 
framework  is  that  if  conditions  change  radically,  the  EA  can  always  be  invoked  by 
the  user  to  generate  new  policies  automatically. 

In  this  paper  we  developed  the  “importance  factor”  via  human  inspection  of 
the  data  (a  set  of  high  fitness  EA  individuals).  However,  suppose  we  include 
additional  cities  from  other  countries.  Then  there  could  be  significant  changes  in 
b  (due  to  differences  in  population  density  in  different  countries).  Although  our 
simulation  framework  would  have  no  difficulty  with  these  changes,  and  the  EA 
could  still  be  applied,  human  inspection  of  the  resulting  high  fitness  individuals 
will  become  much  more  difficult.  One  of  the  main  challenges  is  that  our  human 
reasoning  involved  more  than  just  inspection  of  the  individuals,  but  required  an 
understanding  of  the  underlying  simulation  model,  including  the  city  of  origin, 
the  size  of  the  populations,  the  proper  interpretation  of  the  probability  transition 
matrix,  and  the  vaccine  feasibility  region.  Hence,  future  work  will  require  the 
addition  of  sophisticated  data  mining  (e.g.,  [19,  15,  14])  and  equation  discovery 
components  [5,  33,  3],  to  automate  the  reductionist  approach  taken  in  this  paper. 
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